NGSeasy: a next generation sequencing pipeline in Docker containers [version 1; referees: 3 approved with reservations]

Authors

  • Fabien Campagne
  • Brad Chapman
  • Michael Barton
  • Amos A Folarin
  • Richard JB Dobson
  • Stephen J Newhouse
Abstract

Motivation: Bioinformatic pipelines often use large numbers of components, and deploying them incurs a substantial configuration and maintenance burden that remains a significant barrier to reproducible research. Our aim is to define a new paradigm and best practices for developing, distributing and running pipelines encapsulated in Docker containers (lightweight virtualization), with a focus on next generation sequencing (NGS) workflows. This approach provides several advantages, namely efficiency, portability, versioning and reproducibility. Using the NGSeasy pipeline, a user can quickly deploy any pipeline version in any environment (e.g. operating systems, workstations, clusters, clouds). While this might also be achieved with a virtual machine (VM), VMs lack portability, have substantial overhead (disk, CPU, RAM), and require allocated resources to be provisioned statically. Docker, to a large extent, solves these issues.

Results: We demonstrate best practices for packaging and execution of a multicomponent pipeline for NGS using a set of container building blocks which are versioned, modular and reusable. We present a basic "proof of concept" evaluation of a next generation sequencing pipeline in Docker containers, capable of producing meaningful results that are comparable with public and "best practice" workflows, with little to no impact on standard computing performance.

Availability: Both versioned Dockerfiles and container images for each component are published on GitHub and Docker Hub, respectively. The pipeline and containers can be pulled from Docker Hub and executed on any environment capable of running the Docker platform with minimum hardware requirements for running an NGS pipeline.

This article is included in the Container Virtualization in Bioinformatics channel.

Similar resources

Kvik: three-tier data exploration tools for flexible analysis of genomic data in epidemiological studies [version 1; referees: 2 approved with reservations]

Kvik is an open-source system that we developed for explorative analysis of functional genomics data from large epidemiological studies. Creating such studies requires a significant amount of time and resources. It is therefore usual to reuse the data from one study for several research projects. Often each project requires implementing new analysis code, integration with specific knowledge bas...

Identification of Equid herpesvirus 2 in tissue-engineered equine tendon [version 2; referees: 2 approved with reservations]

Background: Incidental findings of virus-like particles were identified following electron microscopy of tissue-engineered tendon constructs (TETC) derived from equine tenocytes. We set out to determine the nature of these particles, as there are few studies which identify virus in tendons per se, and their presence could have implications for tissue-engineering using allogenic grafts. Virus p...

Disambiguate: An open-source application for disambiguating two species in next generation sequencing data from grafted samples [version 2; referees: 3 approved]

Grafting of cell lines and primary tumours is a crucial step in the drug development process between cell line studies and clinical trials. Disambiguate is a program for computationally separating the sequencing reads of two species derived from grafted samples. Disambiguate operates on DNA or RNA-seq alignments to the two species and separates the components at very high sensitivity and specif...

Towards a More Reliable and Available Docker-based Container Cloud

Operating System-level virtualization technology, or containers as they are commonly known, represents the next generation of light-weight virtualization, and is primarily represented by Docker. However, Docker’s current design does not complement the SLAs from Docker-based container cloud offerings promising both reliability and high availability. The tight coupling between the containers and ...

Bio-Docklets: virtualization containers for single-step execution of NGS pipelines

Processing of next-generation sequencing (NGS) data requires significant technical skills, involving installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized postanalysis visualization and data mining software. In order to address some of these challenges, developers have leveraged virtualization containers toward seamless deployment of preconfigur...

Publication date: 2016